iteration complexity
Confidence-Based Decoding is Provably Efficient for Diffusion Language Models
Diffusion language models (DLMs) have emerged as a promising alternative to autoregressive (AR) models for language modeling, allowing flexible generation order and parallel generation of multiple tokens. However, this flexibility introduces a challenge absent in AR models: the \emph{decoding strategy} -- which determines the order and number of tokens generated at each iteration -- critically affects sampling efficiency. Among decoding strategies explored in practice, confidence-based methods, which adaptively select which and how many tokens to unmask based on prediction confidence, have shown strong empirical performance. Despite this success, our theoretical understanding of confidence-based decoding remains limited. In this work, we develop the first theoretical analysis framework for confidence-based decoding in DLMs. We focus on an entropy sum-based strategy that continues unmasking tokens within each iteration until the cumulative entropy exceeds a threshold, and show that it achieves $\varepsilon$-accurate sampling in KL divergence with an expected number of iterations $\widetilde O(H(X_0)/\varepsilon)$, where $H(X_0)$ denotes the entropy of the target data distribution. Notably, this strategy yields substantial sampling acceleration when the data distribution has low entropy relative to the sequence length, while automatically adapting to the intrinsic complexity of data without requiring prior knowledge or hyperparameter tuning. Overall, our results provide a theoretical foundation for confidence-based decoding and may inform the design of more efficient decoding strategies for DLMs.
- Asia > China > Hong Kong (0.04)
- North America > United States (0.04)
ADMM without a Fixed Penalty Parameter: Faster Convergence with New Adaptive Penalization
Alternating direction method of multipliers (ADMM) has received tremendous interest for solving numerous problems in machine learning, statistics and signal processing. However, it is known that the performance of ADMM and many of its variants is very sensitive to the penalty parameter of a quadratic penalty applied to the equality constraints. Although several approaches have been proposed for dynamically changing this parameter during the course of optimization, they do not yield theoretical improvement in the convergence rate and are not directly applicable to stochastic ADMM. In this paper, we develop a new ADMM and its linearized variant with a new adaptive scheme to update the penalty parameter. Our methods can be applied under both deterministic and stochastic optimization settings for structured non-smooth objective function. The novelty of the proposed scheme lies at that it is adaptive to a local sharpness property of the objective function, which marks the key difference from previous adaptive scheme that adjusts the penalty parameter per-iteration based on certain conditions on iterates. On theoretical side, given the local sharpness characterized by an exponent $\theta\in(0, 1]$, we show that the proposed ADMM enjoys an improved iteration complexity of $\widetilde O(1/\epsilon^{1-\theta})$\footnote{$\widetilde O()$ suppresses a logarithmic factor.} in the deterministic setting and an iteration complexity of $\widetilde O(1/\epsilon^{2(1-\theta)})$ in the stochastic setting without smoothness and strong convexity assumptions. The complexity in either setting improves that of the standard ADMM which only uses a fixed penalty parameter. On the practical side, we demonstrate that the proposed algorithms converge comparably to, if not much faster than, ADMM with a fine-tuned fixed penalty parameter.
Homotopy Smoothing for Non-Smooth Problems with Lower Complexity than O(1/\epsilon)
In this paper, we develop a novel {\bf ho}moto{\bf p}y {\bf s}moothing (HOPS) algorithm for solving a family of non-smooth problems that is composed of a non-smooth term with an explicit max-structure and a smooth term or a simple non-smooth term whose proximal mapping is easy to compute. The best known iteration complexity for solving such non-smooth optimization problems is $O(1/\epsilon)$ without any assumption on the strong convexity. In this work, we will show that the proposed HOPS achieved a lower iteration complexity of $\tilde O(1/\epsilon^{1-\theta})$ with $\theta\in(0,1]$ capturing the local sharpness of the objective function around the optimal solutions. To the best of our knowledge, this is the lowest iteration complexity achieved so far for the considered non-smooth optimization problems without strong convexity assumption. The HOPS algorithm employs Nesterov's smoothing technique and Nesterov's accelerated gradient method and runs in stages, which gradually decreases the smoothing parameter in a stage-wise manner until it yields a sufficiently good approximation of the original function. We show that HOPS enjoys a linear convergence for many well-known non-smooth problems (e.g., empirical risk minimization with a piece-wise linear loss function and $\ell_1$ norm regularizer, finding a point in a polyhedron, cone programming, etc). Experimental results verify the effectiveness of HOPS in comparison with Nesterov's smoothing algorithm and the primal-dual style of first-order methods.
- Asia > Middle East > Saudi Arabia (0.04)
- North America > United States > Connecticut > New Haven County > New Haven (0.04)
- North America > Canada (0.04)
- (5 more...)
- Asia > Middle East > Jordan (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Asia > China > Guangxi Province > Nanning (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
- Information Technology > Artificial Intelligence > Vision (0.93)
- Information Technology > Artificial Intelligence > Natural Language (0.68)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Europe > Russia (0.04)
- Asia > Russia (0.04)
- (2 more...)
- North America > United States > California > Los Angeles County > Los Angeles (0.28)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > Canada > Quebec > Montreal (0.05)
- (11 more...)
- Asia > China > Shaanxi Province > Xi'an (0.04)
- Asia > China > Guangxi Province > Nanning (0.04)
- Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
- Asia > Russia (0.04)
- Asia > Middle East > Jordan (0.04)
- Asia > Middle East > Saudi Arabia > Mecca Province > Thuwal (0.04)